Abstract: A new classificational scheme is presented which
is consistent with Flynn's taxonomy but is more expressive.
The crucial idea is to recognize that a reference stream is
composed of both values and addresses; their treatment
exposes critical features of an architecture. This insight,
together with the accompanying formal mechanism built on
top of it, enables a large variety of machines developed
since Flynn's work to be distinguished, including VLIW,
multigauge, systolic arrays, and the Connection Machines.
Though the resulting taxonomic structure is illuminating,
the most important result of the classification is the
discovery that synchronous execution is NOT a defining
property of computer architectures, but is a derived
property, a consequence of other architectural features.
The evidence for this result and the consequences for
machine classification are presented.

1  Introduction 

In 1966 Flynn [1] introduced his classification of computers.
This taxonomy proved to be very useful, giving us terminology
like SIMD and MIMD that endures to this day. The taxonomy,
however, has long been described as too coarse, unable to
distinguish between computers that seem to computer architects
to be quite different. Though other classifications have been
offered [2-5], the fact that Flynn's classification has lasted
for so long without being replaced or enhanced is a testament
to the difficulty of discovering something better.

In this paper a new taxonomy is presented for synchronous
parallel computers. It has no pretensions of being complete
nor of capturing all features of synchronous parallel
computers. The taxonomy does clarify important distinctions
among recently developed parallel computers, such as the VLIW
machines, multigauge machines and certain SIMD machines such
as the Connection Machines.

The key idea of the taxonomy is to quantify the components
of the fetch/execute cycle that process I-streams and
D-streams. To make fine distinctions among machines, one
must separate these reference streams into their address
and value components, because addressing and value
processing are crucial features by which machines differ.

Using this kind of analysis, a taxonomy is constructed.
Many of the machines that are placed into different classes
here would have been classified by Flynn's scheme as SIMD,
so this approach permits finer distinctions to be made.
Only a small number of classes have been described, and
only one or two machines per class have been identified.
Thus, there remain substantial opportunities for further
research.

*This research was funded in part by the Office of Naval Research
Contract N00014-M-K-0284, National Science Foundation Grant
CCR-841M78, and Air Force Office of Scientific Research Contract
M-0023.


Perhaps the most important result derived from the taxonomy
concerns the property of “synchroneity”. The author and
apparently many other researchers have treated synchroneity
as a primary classificational property: we have spoken of
“the synchronous vs. the asynchronous” machines as if this
should be an important way to distinguish between machines.
It is not. The criterion used for classifying machines in
this taxonomy tells when a machine must have all of its
instructions start at the same time, and when it is not
necessary. This determination is based on how the machine
addresses and processes instructions and data. Machines
which must begin all instructions at the same time will
automatically be synchronous; for those machines that need
not begin their instructions at the same time it is an
“engineering decision” whether to make them synchronous or
asynchronous. Thus the quality of being synchronous is a
derived property: A machine must have it because of other
features, or it is a noncritical implementation feature.

2  Preliminaries 

A reference stream, S, of a computer is a finite set of infinite
sequences of pairs,

$$
S = \{\, (a_1, \langle t_1 \rangle)(a_2, \langle t_2 \rangle) \ldots,\;
         (b_1, \langle u_1 \rangle)(b_2, \langle u_2 \rangle) \ldots,\;
         (c_1, \langle v_1 \rangle)(c_2, \langle v_2 \rangle) \ldots \,\}
$$

the first component of each pair being a nonnegative integer,
called an address, and the second component being an n-tuple
of nonnegative integers, called values, such that n is the
same for all tuples of all sequences. An element of a
reference stream is called a reference sequence. An I-stream
is a reference stream whose values are interpreted as
instructions; a D-stream is a reference stream whose values
are interpreted as data.

The interpretation of these definitions is simple. The
elements of reference sequences are (address, value) pairs,
the values simply being the contents fetched from (or stored
to) memory at the address. A sequence of elements can
be thought of as the history of the addresses and values
moving between a processor and its memory space. An
I-stream is made up of a finite set of these sequences, the
number depending on how many instruction sequences the
machine can process at one time; and a D-stream is made
of a finite set of data sequences, the number depending on
how many distinct operations the machine can perform at
one time.

Although the I-streams and D-streams have been defined
in an intuitive manner, their form is not convenient
for analysis. Accordingly, the following reassociation must
be performed. Let

$$
S = \{\, (a_1, \langle t_1 \rangle)(a_2, \langle t_2 \rangle) \ldots,\;
         (b_1, \langle u_1 \rangle)(b_2, \langle u_2 \rangle) \ldots,\;
         (c_1, \langle v_1 \rangle)(c_2, \langle v_2 \rangle) \ldots \,\}
$$

be a reference stream. Define two sequences: The address
sequence of S, denoted $S_a$, is a sequence whose i-th element
is a tuple formed from the addresses from the i-th elements
of each sequence of S,

$$S_a = \langle a_1 b_1 c_1 \rangle, \langle a_2 b_2 c_2 \rangle, \ldots$$

and the value sequence of S, denoted $S_v$, is the sequence
whose i-th element is a tuple formed by concatenating the
value tuples from the i-th elements of each reference
sequence of S,

$$S_v = \langle t_1 u_1 v_1 \rangle, \langle t_2 u_2 v_2 \rangle, \ldots$$

Notice that although a reference stream is a set of
sequences, address and value sequences are just sequences
of tuples.

It is possible to interpret these definitions as grouping
the corresponding addresses and corresponding value tuples
of a reference stream S into $S_a$ and $S_v$, respectively.

Let $S_x$ be a sequence of n-tuples; the width of the
sequence is $w(S_x) = n$.

Proposition 1: Let S be a reference stream with n-tuple
values, then

$$w(S_a) = |S| \quad\text{and}\quad w(S_v) = n\,|S|$$

where $|X|$ denotes the cardinality of the set X.
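The reassociation and Proposition 1 can be illustrated with a small sketch. It makes two simplifying assumptions that are not part of the formal definitions: each infinite reference sequence is cut to a finite prefix, and the reference stream is held in a list (rather than a set) so that the order of components inside each tuple is fixed. The function names and the sample addresses and values are hypothetical.

```python
# A reference stream as a list of reference sequences; each sequence is a
# finite prefix of (address, value-tuple) pairs. Here |S| = 3 and n = 2.
S = [
    [(10, (1, 2)), (11, (3, 4))],
    [(20, (5, 6)), (21, (7, 8))],
    [(30, (9, 0)), (31, (1, 1))],
]

def address_sequence(stream):
    """S_a: the i-th tuple collects the i-th address of every sequence."""
    return [tuple(addr for addr, _ in step) for step in zip(*stream)]

def value_sequence(stream):
    """S_v: the i-th tuple concatenates the i-th value tuples of every sequence."""
    return [tuple(v for _, values in step for v in values)
            for step in zip(*stream)]

def width(sequence):
    """Width of a sequence of tuples; all tuples must have the same length."""
    widths = {len(t) for t in sequence}
    if len(widths) != 1:
        raise ValueError("tuples of differing length")
    return widths.pop()

Sa = address_sequence(S)
Sv = value_sequence(S)
n = 2

# Proposition 1: w(S_a) = |S| and w(S_v) = n * |S|
assert width(Sa) == len(S)
assert width(Sv) == n * len(S)
```

For this sample stream, `Sa` is `[(10, 20, 30), (11, 21, 31)]`, so the address width equals the number of reference sequences, while each tuple of `Sv` carries six values.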

A computation is a pair (I,D), where I is an I-stream
and D is a D-stream. Computers are classified by the
computations they execute. A computer executes the
computation (I,D) provided it presents $w(I_a)$ instruction
addresses to memory to be fetched simultaneously, it decodes
and interprets $w(I_v)$ instructions simultaneously, it
presents $w(D_a)$ operand addresses to memory simultaneously,
and it performs $w(D_v)$ operations on distinct data values
simultaneously. The computer is described by the notation

$$I_{w(I_a)\,w(I_v)}\; D_{w(D_a)\,w(D_v)}$$

Notice that we speak of the computation executed by a
computer. This is a definitional simplification, and is
sufficient since any desired sequence of instructions or data
is a subsequence of the infinite streams of the computation.
Observe the relationship between this point and the
Enumeration Theorem of recursive function theory.

Let $d_1$, $d_2$, $d_3$ and $d_4$ be predicates called class
designators; then a machine is said to be a member of the class
denoted by

$$I_{d_1 d_2}\, D_{d_3 d_4}$$

if and only if $d_1(w(I_a))$, $d_2(w(I_v))$, $d_3(w(D_a))$, and
$d_4(w(D_v))$. (Commas may occasionally be inserted between the
subscripts for clarity.)


Example 2: By appropriating for our class designators
Flynn’s “s” and “m” to denote the predicates “is-one” and
“is-many”, it is possible to classify some familiar machines
using the mechanisms developed so far.

Let a von Neumann machine, which Flynn classified as
SISD, execute the computation (I,D). From his classification
we have

$$|I| = |D| = 1.$$

By Proposition 1, then, we have

$$w(I_a) = w(D_a) = 1.$$

Moreover, since instructions are decoded serially, $w(I_v) = 1$,
and since they are executed serially, $w(D_v) = 1$. Therefore
the von Neumann machine is described as

$$I_{1,1}\, D_{1,1}.$$

It is classified with the present notation as

$$I_{ss}\, D_{ss}$$

since the predicate “s” is true for all four widths.

Now consider two machines that Flynn’s taxonomy lumped
in the SIMD category, the MPP and the Illiac IV. (Ignore
for the moment the fact that these have bit serial and word
parallel PEs, respectively.) The single instruction stream
means $|I| = 1$ for both machines. By the same reasoning
just used for the von Neumann machine, the instruction
streams for both machines are described as $I_{1,1}$.

For data, consider the MPP first. Recall that the MPP
controller broadcasts the same data memory address to all
PEs [6], and so the machine has a single D-stream in our
terminology, $|D| = 1$. However, a value is fetched from
each PE memory, so the values of this stream are
16384-tuples. Thus,

$$w(D_a) = 1 \quad\text{and}\quad w(D_v) = 16384,$$

which certainly satisfies the “multiple” class designator.

So, the MPP is described as

$$I_{1,1}\, D_{1,16384}$$

and is classified as

$$I_{ss}\, D_{sm}.$$

The MPP has a “multiple data stream” but the multiplicity
applies only to the data values, not to the data value
addressing.

For the Illiac IV on the other hand, the controller
broadcasts a base address to all PEs, each of which may
produce its own address by adding in the contents of a
local index register [7]. This means that $|D| = 64$; there
are 64 operand address streams simultaneously produced
by the machine and each of them references a single value,
i.e. each data address is associated with a 1-tuple.
Accordingly,

$$w(D_a) = w(D_v) = 64$$

and the Illiac IV is described as

$$I_{1,1}\, D_{64,64}$$

which places it in the

$$I_{ss}\, D_{mm}$$

class. It has “multiple data streams” too, but its
multiplicities are for addressing and data reference. Clearly,
the present taxonomy retains the distinctions achieved by
Flynn, but it is also capable of making finer distinctions.
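The three classifications of Example 2 can be reproduced mechanically. The sketch below takes the four widths stated in the text for each machine and applies the designators "s" (is-one) and "m" (is-many); reading "is-many" as "greater than 1" is an assumption made here for illustration, and the names `Machine`, `describe` and `classify` are hypothetical. Bracketed strings stand in for the subscripted notation.

```python
from typing import NamedTuple

class Machine(NamedTuple):
    name: str
    ia: int  # w(I_a): instruction addresses presented at once
    iv: int  # w(I_v): instructions decoded at once
    da: int  # w(D_a): operand addresses presented at once
    dv: int  # w(D_v): operations on distinct data values at once

def describe(m: Machine) -> str:
    """Numeric description, e.g. I[1,1] D[1,16384]."""
    return f"I[{m.ia},{m.iv}] D[{m.da},{m.dv}]"

def designator(width: int) -> str:
    """'s' if the width is one, 'm' otherwise (assumed reading of is-many)."""
    return "s" if width == 1 else "m"

def classify(m: Machine) -> str:
    """Class under the s/m designators, e.g. I[ss] D[sm]."""
    i_part = designator(m.ia) + designator(m.iv)
    d_part = designator(m.da) + designator(m.dv)
    return f"I[{i_part}] D[{d_part}]"

# Widths as given in the text for each machine.
VON_NEUMANN = Machine("von Neumann", 1, 1, 1, 1)
MPP = Machine("MPP", 1, 1, 1, 16384)
ILLIAC_IV = Machine("Illiac IV", 1, 1, 64, 64)

for machine in (VON_NEUMANN, MPP, ILLIAC_IV):
    print(machine.name, describe(machine), classify(machine))
```

All three machines share the instruction part I[ss]; only the data part separates the MPP (D[sm]) from the Illiac IV (D[mm]), which is the distinction Flynn's SIMD category cannot express.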


3  Discussion 

It is possible to give an intuitive interpretation to much
of the foregoing formalism. The key idea is to recognize
that the formalism quantifies functional components of a
fetch/execute cycle. Thus, the machine described as

$$I_{a v}\, D_{a' v'}$$

presents instruction addresses to memory for a threads of
control (presumably from a PCs, but data flow computers
qualify as well); it receives v different instructions back
from memory at once and interprets them; it presents a'
different operand addresses to memory for data values, and
it receives v' data values back and operates upon them
concurrently. So, when the MPP is described as

$$I_{1,1}\, D_{1,16384}$$

it is immediately obvious that its PEs all use the same
address for accessing their operand values, even though
they are capable of independently performing operations
on the resulting data.

The interpretation of the classification is intended to
carry the implication that if the n-tuple of values
$\langle t_i \rangle$ is received from memory upon presentation
of address $a_i$, then the machine is capable of processing
all n elements at once. This applies to both instructions and
data. So even if a computer makes a memory reference to address
$a_i$ and fetches k words, perhaps to cache them, if it
processes only one of them, then n = 1 in this model.

Finally, notice that our classificational scheme is a
completely formal system with a precise meaning. Its utility in
classifying computers depends entirely on our interpreting
this formalism as meaningful. Though it is possible for two
scientists to differ in their interpretation, and thus to differ
on a classification, the underlying scheme is unambiguous.

4  Properties  of  Address  and  Value 
Sequences 

To simplify discussing computer families, it is convenient
to adopt a simple abbreviation. The expression

$$I_{d_1 d_2}\, D_{d_3 d_4}$$

will be abbreviated by the string

$$d_1 d_2 d_3 d_4$$

Thus, the von Neumann machine class is abbreviated ssss,
while the MPP is in sssm. String expressions will be used
as shorthand to abbreviate several classes.

There are several important properties of this taxonomic
system which influence the kind of machine classes
definable.

Proposition 3: Any machine $I_{av}D_{a'v'}$ satisfies the
inequalities:

$$a \le v \quad\text{and}\quad a' \le v'$$

These inequalities follow from the fact that in a reference
stream every address is paired with at least one value, so
the width of the address stream is a lower limit to the width
of the corresponding value stream. The interpretation of
these inequalities seems intuitively correct: The number
of addresses presented to memory should never exceed the
number of values returned. As a corollary, any nonempty
machine class will satisfy these relationships, where the
definition of the relation is suitably extended.

Convention 4: Any machine $I_{av}D_{a'v'}$ will satisfy the
inequality:

$$v \le v'$$

Unlike the preceding propositions, which are artifacts of
the taxonomy’s abstraction, this convention is adopted
primarily for semantic consistency. Its interpretation is that
the number of instructions being interpreted should not
exceed the data available.

Since it is a convention, it is open to debate. On the
positive side the convention helps avoid “problem” machines
like Flynn’s MISD; this machine doesn’t make much
sense and has often been criticized. Here, the convention is
worthwhile, considering that the finer control of this
taxonomy permits greater opportunities to create such dubious
classes. On the negative side, adopting the convention
might prevent accurately describing certain machines,
though none has come to the author's attention. Since a
taxonomy is descriptive (as opposed to being prescriptive)
and given that architects are not likely to have their
creativity constrained by this convention, we adopt it.
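Proposition 3 and Convention 4 amount to three comparisons on the four widths of a machine description, which the following sketch checks. The function name `violations` is hypothetical, and the MISD-like widths (4, 4, 1, 1) are an invented illustration of the kind of machine the convention rules out, not a description of any real computer.

```python
def violations(a, v, a_prime, v_prime):
    """Return the constraints broken by the widths of I[a,v] D[a',v']."""
    broken = []
    if not a <= v:
        broken.append("Proposition 3: a <= v")
    if not a_prime <= v_prime:
        broken.append("Proposition 3: a' <= v'")
    if not v <= v_prime:
        broken.append("Convention 4: v <= v'")
    return broken

# The MPP description I[1,1] D[1,16384] satisfies everything.
print(violations(1, 1, 1, 16384))

# A hypothetical MISD-like machine: four instructions interpreted at once,
# but only one data value operated upon. Only the convention is broken.
print(violations(4, 4, 1, 1))
```

The second call reports only the Convention 4 violation, reflecting that MISD-like descriptions are excluded by the convention rather than by the formalism itself.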


5  More  Machine  Classes 


The efficacy of a classificational system usually depends to
some degree on interpretation. (It always does in biological
taxonomy.) Usually there is a large range (sometimes a
continuum) of values that a property can assume, and we
wish to assign certain segments of this range to different
classes. But there may not be any effective way to identify
the boundaries of these ranges, and so membership is often
a matter of judgement. This characteristic will persist for
this taxonomy, but confusion can be minimized by being
somewhat more precise about the terminology that we've
already used.

Define the class designators as follows:

•  s is the predicate “equals 1”,

•  c is the predicate “from 1 to some (small) constant”,
and

•  m is the predicate “from 1 to an arbitrarily large
finite number”.

Though the c and m designators have no upper limit in
principle, they are intended to convey two different
meanings. When the c designation is used the range has a hard
upper limit, usually due to internal constraints in the
architecture, and cannot be easily increased by a substantial
amount. An example might be the number of instructions
that can be packed in the instruction word of a VLIW
machine [8]; for any given word size it is fixed, and even
though the word size can be increased this is probably not
the intended nor the rational way to generalize the given
machine. The m designation, however, is used when the
quantity can be easily generalized or scaled. An example
is when additional PEs can be added as with the MPP.

These distinctions are not always clear, of course, and
judgement must be applied. An example is the question
of how to classify a machine with processors connected
to a bus [9]. In principle, there is no limit to how many
processors can be attached to a bus, but with the addition
of each processor the congestion increases, and this is an
internal constraint reducing the performance. Is this a “c”
or “m” case? Arguments can be made on both sides; we
leave the question open for the moment.

It is now possible, using the class designators,
Proposition 3 and the convention, to define a number of machine
classes. Notice that there is no attempt to be complete in
either defining classes or categorizing machines:

$I_{ss}D_{ss}$  von Neumann machines.

$I_{ss}D_{sc}$  “Packed” von Neumann [10]; the machine can
          fetch several distinct data values from fixed
          positions from one address and simultaneously
          apply the same operation to them. Many machines
          have some instructions of this form, e.g.
          performing 2 half word adds on the word at a
          given address; all (ALU) instructions for a
          machine in this class would have this capability.

$I_{ss}D_{sm}$  SIMD Parallel Machines with no addressability,
          such as the MPP, the Connection Machine 1 [11]
          and systolic arrays [12].

$I_{ss}D_{cc}$  SIMD Multigauge machines [13]; these are
          von Neumann machines which can (optionally)
          split their datapath to process multiple,
          independent operand streams at once.

$I_{ss}D_{mm}$  Addressable SIMD Parallel Machines, such as
          Illiac IV and the CM2 [14].

$I_{sc}D_{cc}$  VLIW Machines [8]; the machine fetches and
          executes several instructions stored in one
          instruction address.

$I_{cc}D_{cc}$  MIMD Multigauge machines [13]; these are
          von Neumann machines which can (optionally)
          split their fetch/execute cycle to process
          multiple, independent instructions concurrently.

$I_{mm}D_{mm}$  MIMD Parallel Machines, including machines
          such as the Ultracomputer [15] and the Cosmic
          Cube [16].

Clearly,  the  list  is  not  complete  in  terms  of  either  the 
classes  listed  or  the  machines  recognized  as  members  of 
any  given  class.  Much  work  remains. 
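How much room the taxonomy leaves for further classes can be gauged by enumerating every four-letter string over the designators s, c and m and keeping those consistent with Proposition 3 and Convention 4. The sketch below assumes that the "suitably extended" relation orders the designators as s ≤ c ≤ m; that ordering, and the names `RANK`, `is_legal`, `LEGAL` and `LISTED`, are assumptions introduced here rather than part of the taxonomy as stated.

```python
from itertools import product

# Assumed extension of <= to designators: s <= c <= m.
RANK = {"s": 0, "c": 1, "m": 2}

def is_legal(cls: str) -> bool:
    """True if the class string d1 d2 d3 d4 satisfies
    a <= v and a' <= v' (Proposition 3) and v <= v' (Convention 4)."""
    a, v, a_prime, v_prime = (RANK[d] for d in cls)
    return a <= v and a_prime <= v_prime and v <= v_prime

# All 3**4 = 81 candidate strings, filtered by the three constraints.
LEGAL = [cls for cls in map("".join, product("scm", repeat=4))
         if is_legal(cls)]

# The eight classes listed above, in abbreviated string form.
LISTED = ["ssss", "sssc", "sssm", "sscc", "ssmm", "sccc", "cccc", "mmmm"]

print(len(LEGAL), "legal classes out of 81")
print("unlisted:", [cls for cls in LEGAL if cls not in LISTED])
```

Under this assumed ordering 25 of the 81 strings survive, and all eight listed classes are among them, so 17 admissible classes remain without an identified machine.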

6  Discussion of the Taxonomy and
The Origins of Synchronous Computation

One is struck by at least two aspects of the foregoing
classification: A large and diverse set of machines are lumped
into the last classification, mmmm, and nowhere in the
taxonomy has the synchronous requirement been mentioned,
except in the paper’s title. These two observations are
related.

In effect the taxonomy uses as its “criterion for
classification” the number of repeated instances of the principal
functional activities of the fetch/execute cycle. So,
machines are distinguished by how many instructions they
can decode at once or how many operations on separate
data they can perform simultaneously. But these are not
the characteristics we think of as distinguishing the
different MIMD parallel computers. Rather, we think of them
as being different depending on whether or not they have
global shared memory or what their interconnection topology
is. These are features unrelated to the fetch/execute
cycle. So, lumping MIMD parallel computers in the mmmm
class says only that by the criterion applied, they are all
equivalent.

This is unsurprising and is not evidence of weakness
in the classification. Indeed, it might point to why efforts
to find criteria suitable for classifying all parallel machines
have so far been unsuccessful: Qualities that are important
for some computers are, for other computers, unimportant,
irrelevant or even misleading as a guide to classification.
To the extent that the taxonomy provides insight by its
classifications, the “criterion” amounts to being a useful
way of looking at some computers. (For cases where topology
matters for synchronous machines, e.g. between the
MPP and the CM1, see the next section.)

Interestingly, the “criterion” apparently mandates the
synchronous property. By using “the number of repeated
instances of the principal functional activities of the
fetch/execute cycle” as the basis for classification, we are
thinking of machines as either a single f/e cycle that has
certain components replicated, or multiple copies of an f/e
cycle. In the former case (all legal machines of the form sy,
$y \in \{s,c,m\}^3$) synchronous execution is mandated because
there is only one cycle running. With the current structure
of the taxonomy this leaves only the cy classes and mmmm.
Though there is no requirement in the model that these be
synchronous, the class designators provide some basis for
deciding: The c designation carries with it the implication